A simulation of me: fine-tuning an LLM on 240k text messages

I’m far from the first to think of this. Several people — perhaps inspired by creepy Black Mirror episodes — have tried to fine-tune an LLM on their SMS or WhatsApp history in an effort to create a simulation of themselves.

Generally, results have been disappointing; conversations tend to be mundane (it seems we don’t share our most profound insights by text) and quickly descend into loops.

I’ve been dying to have a try myself. Over the holiday break, I had a few spare moments and a $100 Xmas gift to blow on Google Cloud compute, and I got to work between helpings of turkey.

The strategy

I’ll write a series of posts about how I approached it, but here’s the gist:

1. GET THE MESSAGES

I used a utility called iMazing to download all my SMS / iMessage and WhatsApp conversations. I filtered out group chats, people not in my contacts, and people I rarely message. This yielded 240,805 messages with 288 people.
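The filtering itself is simple; here’s a rough sketch in pandas, assuming a CSV export with hypothetical column names (is_group, chat_name, text) and an illustrative cut-off for people I rarely message — the real iMazing export may be structured differently:

```python
import pandas as pd

# Hypothetical column names; the real iMazing export may differ.
df = pd.read_csv("imazing_export.csv")

MIN_MESSAGES = 50  # illustrative threshold for "people I rarely message"
contacts = set(pd.read_csv("contacts.csv")["name"])  # hypothetical contacts list

df = df[~df["is_group"]]                     # drop group chats (assumes a boolean column)
df = df[df["chat_name"].isin(contacts)]      # keep only people in my contacts
counts = df["chat_name"].value_counts()
df = df[df["chat_name"].isin(counts[counts >= MIN_MESSAGES].index)]

print(f"{len(df)} messages across {df['chat_name'].nunique()} people")
```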

2. PREPARE THE DATASET

I created my training and test datasets by grouping messages with the same person into chunks, packing as many messages as possible into each chunk without exceeding 200 tokens. This resulted in 25,087 chunks of data, with each chunk containing a prompt and about 7-10 messages.
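As a sketch (not my exact code, and the field names are illustrative), the packing works like this: walk through each conversation in order, appending messages to the current chunk until adding one more would push it past 200 tokens as counted by the Llama 2 tokenizer, then start a new chunk. Each chunk is then prefixed with the prompt shown in the example below.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
MAX_TOKENS = 200

def pack_chunks(messages):
    """messages: list of {"contact", "sender", "text"} dicts, sorted by contact then time."""
    chunks, current, contact = [], [], None
    for m in messages:
        line = f"### {m['sender']}: {m['text']}"
        candidate = "\n".join(current + [line])
        too_long = len(tokenizer.encode(candidate)) > MAX_TOKENS
        if m["contact"] != contact or too_long:
            if current:
                chunks.append({"contact": contact, "text": "\n".join(current)})
            current, contact = [line], m["contact"]
        else:
            current.append(line)
    if current:
        chunks.append({"contact": contact, "text": "\n".join(current)})
    return chunks
```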

Here’s an example chunk with a more administrative interaction between me and my building’s handyman, with some details changed.

<<SYS>>Write a realistic text message chat. Avoid repetition.<</SYS>>
[INST]Write a chat between Edward and Josh[/INST]

### Josh: Thanks Ed will call you soon as I get back to work
### Josh: Hi Ed When you have time give me a call so we can discuss about paint
### Edward: Liked “Hi Ed When you have time give me a call so we can discuss about paint ”
### Edward: hey Josh. It’s Ed from 24B. Any chance you could do the painting this week? Wed or Thurs?
### Edward: i’m on a call sorry free in an hour
### Josh: Just call me when you have a chance
### Edward: hey Josh - just checking in - is Tom able to do the painting?

3. HYPER-PARAMETER SEARCH

I started with the 7B-parameter Llama 2 and fine-tuned it using QLoRA on V100 VMs. I found conflicting advice all over the internets on QLoRA hyper-parameters and ended up doing a pretty extensive search. Later I upgraded to the 13B-parameter model, with a marked improvement in results.
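For anyone who wants a starting point, here’s a minimal QLoRA setup sketch using transformers, peft and bitsandbytes. The hyper-parameters shown (rank, alpha, dropout, target modules) are illustrative defaults, not the values my search landed on:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # QLoRA: 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute: V100s don't support bfloat16
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative values only
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```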

4. BACK TO THE DATA

Then I returned to step 2. I worked on better ways to format the training data and on improvements to the prompt. I was thrown off at first because the training and eval loss actually increased, but the results were absolutely improving.

5. GENERATION

Finally, I used Hugging Face’s text generation APIs to write conversations, either playing the role of me, or one of my contacts, or acting as both sides. I experimented with repetition penalties, token suppression, beam search and more.
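Here’s a hedged sketch of what a generation call can look like (the model path, seed message and parameter values are all illustrative): prime the prompt with the opening messages and let the model continue, tuning the knobs mentioned above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "path/to/my-finetuned-llama"  # hypothetical path to the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, device_map="auto")

# Same format as the training chunks: system prompt, instruction, then seed messages.
prompt = (
    "<<SYS>>Write a realistic text message chat. Avoid repetition.<</SYS>>\n"
    "[INST]Write a chat between Edward and Josh[/INST]\n\n"
    "### Josh: Hi Ed, any update on the painting?\n"   # illustrative seed message
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,            # illustrative sampling settings
    repetition_penalty=1.15,    # discourage loops
    # num_beams=4,              # alternative: beam search (with do_sample=False)
    # suppress_tokens=[...],    # optionally block specific token ids
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```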

Initial disappointment

My early attempts yielded similar results to my predecessors. The LLM would generate conversations that seemed to be surreal parodies of my life. My work colleagues would endlessly reschedule meetings earlier and later; my partner and I would relentlessly debate our lunch options. Efforts to encourage the model to be more diverse only resulted in increasingly unappetizing lunch alternatives.

But then…

To my enormous delight, persistence paid off, largely thanks to work on the prompt, the input data format and the generation approach. The LLM has become highly effective at imitating me. To my surprise, it can also convincingly act as many of my friends — really anyone with whom I’ve exchanged at least 1,000 messages.

I’m startled by how real the conversations seem, and how they reflect the nuances of my different relationships. The model isn’t always on point – it still sometimes gets stuck in loops or goes awry – but I would say more than half of the conversations would pass a Turing Test, in that I wouldn’t be able to tell you whether they were real or generated. I’ve actually been so surprised that I’ve sometimes searched my text history in case the model is regurgitating real conversations (it’s not).

This is less surprising, but I think it’s really cute – for my contacts whose first language isn’t English, the model cruelly reproduces their spelling and grammar mistakes and quirky expressions.

Examples

I’m not comfortable sharing generated conversations with actual contacts, so instead for this post I primed the LLM with messages from fictional people (in orange below) and let it imagine how the conversation might proceed. I must admit that the performance was considerably poorer; without more context to go on, the model quickly fell into its comfort zone of meeting logistics. I have ideas of how to improve this by providing more of a backstory in the prompt.

### Edward: Hi David, this is Ed. We met at yesterday's AI conference. You were interested in hearing more about our matching models and perhaps trying our product.
### David: Hey Ed, I am very interested in the product, I have a project that I am working on and I would love to learn more about your matching models.
### Edward: Sure, I'm free for the rest of today if you are. We can talk about it in more detail.
### David: Ok, let's talk then.
### David: Hi Ed, I am available now, I am at home and I am free to talk about the product.
### Edward: 2 mins
### David: Ok

### Edward: Hey Lucy! This is Edward. It was great to bump into you yesterday - I don't think we've spoken since university! How are you?
### Lucy: Hey Edward! I'm great! I'm working at a restaurant and I'm doing well. I'm going to a festival in August and I'm going to London in September!
### Edward: that's fantastic! what's the festival? where in London?
### Lucy: It's called the Boomtown festival. I'm going for a weekend. In London I'm going to see my friend and go to some museums.
### Edward: 2013 is a big year for me - I'm getting married in Oct! I'm also in London in September, I'll be there for 2 weeks, I'd love to see you if you're around
### Lucy: 2 weeks! Wow! I'm going to be in London from the 13th to the 21st! We should meet!
### Edward: 13th - 15th I'm free, then I'm in Paris for a weekend, then back in London 17th - 21st, any chance you'll be there?
### Lucy: 13-15 and 17-21! I'm going home on the 19th. I'll be there!
### Edward: 21st is my last day, but if you're still around I'd love to meet you

Here’s a more intriguing example of the model imitating me; again text in orange has been provided to the model and it supplied the responses:

### Edward: Hey
### Lucas: Hey I have a question for you. Tell me a secret, something I don't know about you.

### Edward: 1. I'm a terrible dancer. 2. I'm a bit of a nerd.
### Lucas: OK I know both those things. Tell me a deep secret that nobody knows about you.
### Edward: 3. *** Deeper secret redacted!!! ***

My deeper secret is nothing too outrageous, but it’s at least of the “I could tell you but then I’d have to kill you” variety, so probably best not to publish here…

Not going to go there

This post could take a rather macabre turn at this point. A very close friend of mine died about 10 years ago, and we have a long text message history together. You don’t need to watch Black Mirror to know where I’m heading with this. But you’ll be relieved to hear that I recognized this would be in extremely poor taste and removed the data. Enough on that topic!

EDIT: Hacker News reader wyldfire pointed out that there could be a more palatable angle, questioning whether “LLMs could advance to a point where it might be therapeutic to have a conversation with a deceased loved one.” Much discussion followed.

What’s next

I’ll be writing a series of posts on my approach so that others can experiment. Here is part 1, followed by part 2. I’d love to hear how you get on and to help with any roadblocks. I learned so much on this journey and very much enjoyed the experience, and I hope you will too.

I’m excited to try RAG and other techniques to give the model more context on its conversations. I expect another step change in the quality of the conversations. I look forward to the day the model can fully replace me in replying to all my text messages…

I’m sure there’s tons more mileage in improving my LLM. I’m going to try other base models, further refinement of the prompt and input data, and more work on the generation. Most importantly, it goes without saying: with every day that goes by, my training dataset grows a little larger!

5 responses to “A simulation of me: fine-tuning an LLM on 240k text messages”

  1. Any chance you’d be willing to open source some of this code?

    1. For sure – I’ll be writing posts to describe my journey and open sourcing the code. I might not open source my text history though 😂

  2. This reminds me of an episode in Season 3 of Westworld where Aaron Paul’s character Caleb has a subscription to a service that allows him to talk to an old friend who, it turns out, died years ago. That show is brilliant, so many philosophical ideas presented in it about free will and what makes humans unique as we get closer to AGI.

    1. Yes – and several other examples in the Hacker News discussion, including Ubik by Philip K. Dick and Neuromancer. https://news.ycombinator.com/item?id=38847581

  3. Immensely interesting, looking forward to more posts related to your personal experiments with the LLM’s potential
